
An Evaluation of Representation Learning Methods in Particle Physics Foundation Models

Chen, Michael, Kansal, Raghav, Gandrakota, Abhijith, Hao, Zichun, Ngadiuba, Jennifer, Spiropulu, Maria

arXiv.org Artificial Intelligence

We present a systematic evaluation of representation learning objectives for particle physics within a unified framework. Our study employs a shared transformer-based particle-cloud encoder with standardized preprocessing, matched sampling, and a consistent evaluation protocol on a jet classification dataset. We compare contrastive (supervised and self-supervised), masked particle modeling, and generative reconstruction objectives under a common training regimen. In addition, we introduce targeted supervised architectural modifications that achieve state-of-the-art performance on benchmark evaluations. This controlled comparison isolates the contributions of the learning objective, highlights their respective strengths and limitations, and provides reproducible baselines. We position this work as a reference point for the future development of foundation models in particle physics, enabling more transparent and robust progress across the community.


TRANSIT your events into a new mass: Fast background interpolation for weakly-supervised anomaly searches

Oleksiyuk, Ivan, Voloshynovskiy, Svyatoslav, Golling, Tobias

arXiv.org Artificial Intelligence

We introduce a new model for conditional and continuous data morphing called TRansport Adversarial Network for Smooth InTerpolation (TRANSIT). We apply it to create a background data template for weakly-supervised searches at the LHC. The method smoothly transforms sideband events to match signal-region mass distributions. We demonstrate the performance of TRANSIT using the LHC Olympics R&D dataset. The model captures non-linear mass correlations of features and produces a template that offers competitive anomaly sensitivity compared to state-of-the-art transport-based template generators. Moreover, the computational training time required for TRANSIT is an order of magnitude lower than that of competing deep learning methods. This makes it ideal for analyses that iterate over many signal regions and signal models. Unlike generative models, which must learn the full probability density, i.e., the correlations between all the variables, the proposed transport model only has to learn a smooth conditional shift of the distribution. This allows for a simpler, more efficient residual architecture, which lets mass-uncorrelated features pass through the network unchanged while mass-correlated features are adjusted accordingly. Furthermore, we show that the latent space of the model provides a set of mass-decorrelated features useful for anomaly detection without background sculpting.


Improving new physics searches with diffusion models for event observables and jet constituents

Sengupta, Debajyoti, Leigh, Matthew, Raine, John Andrew, Klein, Samuel, Golling, Tobias

arXiv.org Artificial Intelligence

We introduce a new technique called Drapes to enhance the sensitivity of searches for new physics at the LHC. By training diffusion models on side-band data, we show how background templates for the signal region can be generated either directly from noise or by partially applying the diffusion process to existing data. In the partial-diffusion case, data can be drawn from the side-band regions, with the inverse diffusion performed for new target conditional values, or from the signal region, preserving the distribution over the conditional property that defines the signal region. We apply this technique to the hunt for resonances using the LHCO di-jet dataset and achieve state-of-the-art performance for background template generation using high-level input features. We also show how Drapes can be applied to low-level inputs with jet constituents, reducing the model dependence on the choice of input observables. Using jet constituents we can further improve sensitivity to the signal process, but we observe a loss in performance when the signal significance before applying any selection is below 4σ.


Efficient and Robust Jet Tagging at the LHC with Knowledge Distillation

Liu, Ryan, Gandrakota, Abhijith, Ngadiuba, Jennifer, Spiropulu, Maria, Vlimant, Jean-Roch

arXiv.org Artificial Intelligence

The challenging environment of real-time data processing systems at the Large Hadron Collider (LHC) strictly limits the computational complexity of algorithms that can be deployed. For deep learning models, this implies that only models with low computational complexity, and hence weak inductive bias, are feasible. To address this issue, we use knowledge distillation to combine the performance of large models with the reduced computational complexity of small ones. In this paper, we present an implementation of knowledge distillation, demonstrating an overall boost in the student models' performance for the task of classifying jets at the LHC. Furthermore, by using a teacher model with a strong inductive bias of Lorentz symmetry, we show that we can induce the same inductive bias in the student model, which leads to better robustness against arbitrary Lorentz boosts.
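The distillation recipe described above, training a small student against a large teacher's softened outputs, typically follows Hinton-style soft targets. A minimal NumPy sketch of that loss follows; the temperature and mixing values are illustrative defaults, not values from the paper.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Blend hard-label cross-entropy with a KL term that pulls the student's
    temperature-softened distribution toward the teacher's."""
    p_t = softmax(teacher_logits, T)
    log_p_s = np.log(softmax(student_logits, T))
    # KL(teacher || student); the T^2 factor keeps gradient scales comparable
    # across temperatures (Hinton et al.'s convention).
    soft = (p_t * (np.log(p_t) - log_p_s)).sum(axis=-1).mean() * T**2
    hard = -np.log(softmax(student_logits)[np.arange(len(labels)), labels]).mean()
    return alpha * soft + (1.0 - alpha) * hard
```

Raising `T` exposes the teacher's relative probabilities among non-maximal classes, which is the extra signal, here the Lorentz-symmetric teacher's inductive bias, that the student can absorb beyond the hard labels.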


Decorrelation using Optimal Transport

Algren, Malte, Raine, John Andrew, Golling, Tobias

arXiv.org Artificial Intelligence

Decorrelating a feature space from protected attributes is an area of active research in ethics, fairness, and the natural sciences. We introduce a novel decorrelation method using Convex Neural Optimal Transport Solvers (Cnots) that can decorrelate a continuous feature space from protected attributes with optimal transport. We demonstrate how well it performs in the context of jet classification in high energy physics, where classifier scores should be decorrelated from the mass of a jet. The decorrelation achieved in binary classification approaches the levels achieved by the state of the art using conditional normalising flows. For multiclass outputs, the optimal transport approach performs significantly better than the state of the art, suggesting substantial gains in decorrelating multidimensional feature spaces.


CURTAINs Flows For Flows: Constructing Unobserved Regions with Maximum Likelihood Estimation

Sengupta, Debajyoti, Klein, Samuel, Raine, John Andrew, Golling, Tobias

arXiv.org Artificial Intelligence

Model-independent techniques for constructing background data templates using generative models have shown great promise for use in searches for new physics processes at the LHC. We introduce a major improvement to the CURTAINs method by training the conditional normalizing flow between two side-band regions using maximum likelihood estimation instead of an optimal transport loss. The new training objective improves the robustness and fidelity of the transformed data and is much faster and easier to train. We compare the performance against the previous approach and the current state of the art using the LHC Olympics anomaly detection dataset, where we see a significant improvement in sensitivity over the original CURTAINs method. Furthermore, CURTAINsF4F requires substantially less computational resources to cover a large number of signal regions than other fully data-driven approaches. When using an efficient configuration, an order of magnitude more models can be trained in the same time required for ten signal regions, without a significant drop in performance.


Resonant Anomaly Detection with Multiple Reference Datasets

Chen, Mayee F., Nachman, Benjamin, Sala, Frederic

arXiv.org Artificial Intelligence

An important class of techniques for resonant anomaly detection in high energy physics builds models that can distinguish between reference and target datasets, where only the latter has appreciable signal. Such techniques, including Classification Without Labels (CWoLa) and Simulation Assisted Likelihood-free Anomaly Detection (SALAD), rely on a single reference dataset. They cannot take advantage of commonly available multiple datasets and thus cannot fully exploit the available information. In this work, we propose generalizations of CWoLa and SALAD for settings where multiple reference datasets are available, building on weak supervision techniques. We demonstrate improved performance in a number of settings with realistic and synthetic data. As an added benefit, our generalizations enable us to provide finite-sample guarantees, improving on existing asymptotic analyses.
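The CWoLa idea underlying this line of work can be sketched in a few lines: classify a signal-doped target sample against pure-reference samples, and the classifier learns a signal-sensitive score without ever seeing signal labels. The toy data, the naive pooling of the reference datasets, and the tiny logistic regression below are invented for illustration and are not the paper's proposed generalization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D setup: background ~ N(0, 1), signal ~ N(2, 0.5). The target sample
# contains a small, unknown signal admixture; the references are pure background.
def make_sample(n, sig_frac):
    n_sig = int(n * sig_frac)
    bkg = rng.normal(0.0, 1.0, n - n_sig)
    sig = rng.normal(2.0, 0.5, n_sig)
    return np.concatenate([bkg, sig])

target = make_sample(5000, 0.10)
references = [make_sample(5000, 0.0) for _ in range(3)]  # multiple references

# CWoLa-style training: label target events 1 and (here, naively pooled)
# reference events 0, then fit a tiny logistic regression by gradient descent.
x = np.concatenate([target] + references)
y = np.concatenate([np.ones(len(target)),
                    np.zeros(sum(len(r) for r in references))])
w, b = 0.0, 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))
    w -= 0.5 * ((p - y) * x).mean()
    b -= 0.5 * (p - y).mean()

def score(v):
    """Classifier output; higher values indicate more signal-like events."""
    return 1.0 / (1.0 + np.exp(-(w * v + b)))
```

Because only the target contains signal, the optimal target-vs-reference classifier is monotonically related to the signal-vs-background likelihood ratio, which is why the score orders events by signal-likeness despite the fully label-free training.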


Decorrelation with conditional normalizing flows

Klein, Samuel, Golling, Tobias

arXiv.org Artificial Intelligence

The sensitivity of many physics analyses can be enhanced by constructing discriminants that preferentially select signal events. Such discriminants become much more useful if they are uncorrelated with a set of protected attributes. In this paper we show that a normalizing flow conditioned on the protected attributes can be used to find a decorrelated representation for any discriminant. As a normalizing flow is invertible, the separation power of the resulting discriminant is unchanged at any fixed value of the protected attributes. We demonstrate the efficacy of our approach by building supervised jet taggers that produce almost no sculpting in the mass distribution of the background.
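In one dimension, the conditional-flow construction reduces to mapping each score through its conditional CDF, which is invertible at any fixed value of the protected attribute and yields a uniform, decorrelated output. The sketch below uses binned empirical CDFs as a crude stand-in for the learned flow; the toy data and binning are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy discriminant whose distribution drifts with a protected attribute m
# (think: jet mass): scores shift upward with m, so score and m are correlated.
m = rng.uniform(0.0, 1.0, 20000)
score = rng.normal(loc=m, scale=0.3, size=m.size)

# The invertible, m-conditioned map that decorrelates a 1-D score is the
# conditional CDF u = F(score | m). Approximate F with empirical CDFs in
# coarse bins of m (a stand-in for the learned conditional flow).
n_bins = 10
bins = np.clip((m * n_bins).astype(int), 0, n_bins - 1)
u = np.empty_like(score)
for b in range(n_bins):
    sel = bins == b
    ranks = score[sel].argsort().argsort()       # empirical CDF via ranks
    u[sel] = ranks / (sel.sum() - 1)             # rescale to [0, 1]

# Within each m bin the transform is monotone (separation power preserved
# there), and u is ~uniform in every bin, so its correlation with m vanishes.
corr_before = np.corrcoef(score, m)[0, 1]
corr_after = np.corrcoef(u, m)[0, 1]
```

The rank map is monotone within each bin, so any cut on `u` selects the same events as some m-dependent cut on `score`; that is the invertibility argument from the abstract in its simplest form.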


Jet Constituents for Deep Neural Network Based Top Quark Tagging

Pearkes, Jannicke, Fedorko, Wojciech, Lister, Alison, Gay, Colin

arXiv.org Machine Learning

Recent literature on deep neural networks for tagging highly energetic jets from top quark decays has focused on image-based techniques or multivariate approaches using high-level jet substructure variables. Here, a sequential approach is taken instead, using an ordered sequence of jet constituents as training inputs. Unlike the majority of previous approaches, this strategy does not lose information through pixelisation or the calculation of high-level features. The jet classification method achieves a background rejection of 45 at a 50% efficiency operating point for reconstruction-level jets in the transverse momentum range of 600 to 2500 GeV, and is insensitive to multiple proton-proton interactions at the levels expected throughout Run 2 of the LHC.